1. INTRODUCTION

Graphs enable us to visualize connected data and to rely on our visual abilities to uncover valuable hidden knowledge, which can be used to improve the decision-making process of an organization. However, with the continuously growing amount of available data, visualizing large graphs has become a very complex task that outpaces the human ability to process, analyze, visualize and even understand them. Therefore, a process for reducing large graphs into smaller, more representative ones is needed. To address this challenge, graph clustering has recently emerged as a promising research area.

Graph clustering can be defined as the problem of collecting similar nodes into the same groups, called “clusters.” It is a widely known technique with applications in various fields such as social media [1], Web search results optimization [2], wireless sensor networks [3] and biochemical neural networks [4], among others. In most cases, the number of clusters to form is already known and is given as an input to the clustering algorithm. However, with the prevalence of Big Data, it has become harder to know the number of clusters in advance. This also applies to large graph clustering, where the decision maker's visual abilities are not sufficient to provide even an approximate prior idea of the eventual number of clusters. Therefore, it has become imperative to propose solutions where the clustering algorithm can automatically “guess” the correct number of clusters before proceeding with the clustering operation. This research field, called “Automatic Graph Clustering,” started in the late 1990s but did not blossom until the late 2000s and early 2010s, with the introduction of artificial intelligence concepts such as nature-inspired algorithms [5], [6].

In most of the papers in the literature related to automatic graph clustering based on nature-inspired algorithms, the main idea is to define a “Similarity Measure,” then set the clusters according to it. Several papers adopted a discrete formulation of this problem and proposed adaptations of the inherently continuous nature-inspired algorithms to solve it.

In the present work, we proceed differently: we adapt the graph clustering problem itself so that it can be represented as a continuous problem. This adaptation uses “Bat-Cluster,” a combination of “FFDP,” a large graph visualization algorithm developed by our team [7], and the “Bat Algorithm,” a nature-inspired optimization algorithm developed by Xin-She Yang [8] based on the behavior of bats. FFDP sets an equilibrium positioning of the large graph; then it provides the final positions of the nodes as a vector of coordinates. The Bat Algorithm takes this vector into consideration and tries to find the best possible clustering configuration.

After reviewing the related works in Section 2, we describe, in Section 3, the similarity measure we used as an objective function and how it is optimized by the “Bat-Cluster” (BC) algorithm.

The testing and results of the clustering provided by “Bat-Cluster,” compared with other well-known solutions such as PSO, Differential Evolution and Ant Colony Optimization, are discussed in Section 4.

Section 5 concludes the paper and outlines our future work.


2. RELATED WORKS 
In this section, we explore some of the most important nature-inspired solutions proposed for automated graph clustering before introducing Bat-Cluster in Section 3.


2.1. Particle Swarm Optimization 

The literature contains several approaches to using PSO in graph clustering, often referred to as “Community Detection.” Most of these approaches are based on the idea of adapting PSO, an algorithm originally designed to solve continuous optimization problems, so that it can solve discrete problems. Cai et al. proposed in [9] and [10] an alteration of the definition of the position and velocity terms, where the position vector represents a partition of a signed network and the velocity represents an eventual permutation of the partition. Suganthi and Rajagopalan [11] applied PSO in its continuous state, but they suggested using a multiple-population swarm instead of the standard PSO with one population. Rejina Parvin and Vasanthanayaki [12] used PSO to prevent residual nodes in wireless sensor networks (nodes that do not belong to any cluster). Their idea was applied to optimize the energy consumption, throughput, packet delivery ratio, and network lifetime of wireless sensor networks.


2.2. Ant Colony Optimization 

Mandala et al. [13] proposed an ACO-based technique for graph clustering and applied it to detecting customer communities in the e-marketing field. Ji et al. [14] suggested a solution to the problem of complex community detection in large graphs based on the strategy of ant pheromone diffusion and update to search for an optimal graph partitioning. Zhou et al. [15] followed a similar process, but they took the overlapping of large communities into consideration. Moradi and Rostami [16] used ACO along with feature selection to define clusters of features. Gao et al. [17] proposed a combination of ACO and K-Means as a solution to the dynamic location routing problem. K-Means is used to define the location of depots (cluster centers), while ACO is used to handle the vehicle routing problem (VRP) in dynamic environments.


2.3. Differential Evolution 

Paterlini et al. [18] proposed a direct application of DE to solve the problem of graph partitioning; a comparative study with the Genetic Algorithm (GA) showed that DE was more efficient. Cai et al. [19] proposed an adaptation of DE inspired by the phenomenon of social learning in animal societies. They improved the traditional DE by introducing the strategic ASL selection. It allows the algorithm to rely on the information extracted from the neighborhood relationships of its population individuals to guide the selection of the eligible parents for the crossover. Attempts to hybridize DE with other algorithms can be found in recent literature. For instance, Zorarpaci and Ozil [20] suggested a combination of DE and the Artificial Bee Colony algorithm and applied it to solve the problem of feature selection.


3. PROPOSED SOLUTION: “BAT-CLUSTER”
3.1. Objective Function 

The objective function of the algorithm is the quality measure that helps it decide which clustering configuration is the best. Nanda and Panda [21] provided a list of several clustering quality metrics available in the literature. What we want is a clustering able to highlight, on the one hand, the closeness between similar nodes, and on the other hand, the separation between different nodes. Therefore, distance should play a fundamental role in choosing our quality metric. However, relying only on the distance from the cluster center, as in the traditional K-Means, or only on the distance between cluster centers may not be sufficient.

We need a metric that combines these two distances so that it ensures that similar nodes are close to each other and far from the nodes that are different from them.

One of the most popular metrics in the literature is called “DBIndex” [22]. It was developed by 
Davies and Bouldin, and it provides a ratio between the intra-cluster distance (the distance between the nodes 
in the same cluster) and the inter-cluster distance (the distance between the centers of each cluster). 

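DBIndex is defined as:

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{d(c_i, c_j)}, \qquad S_i = \frac{1}{|C_i|} \sum_{x \in C_i} d(x, c_i) \tag{1}$$

where:
$d(x, y)$ is the Euclidean distance between $x$ and $y$.
$C_i$ is the cluster $i$.
$c_i$ is the center of the cluster $i$.
$|C_i|$ is the norm (number of nodes) of $C_i$.
$k$ is the number of clusters.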


According to Davies and Bouldin [22], a correct clustering minimizes the DBIndex as depicted in Equation (1). That being said, the objective function for our clustering algorithm should be:

$$f = \min DB \tag{2}$$

To solve Equation (2), we propose a hybridization of the standard Bat Algorithm by Yang [8] with the FFDP algorithm that we developed in a previous work [7]. We chose to call this hybridization “Bat-Cluster,” or BC.
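As an illustration of how the DBIndex can be evaluated for a set of candidate cluster centers, the following Python sketch computes the standard Davies-Bouldin ratio from node coordinates. It is a minimal sketch and not the authors' implementation: the function names `euclidean` and `db_index`, the nearest-center assignment, and the handling of empty clusters and coincident centers are assumptions made for the example.

```python
import math


def euclidean(p, q):
    """Euclidean distance d(p, q) between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def db_index(coords, centers):
    """Davies-Bouldin index of the partition obtained by assigning each
    node to its nearest center. Lower values mean a better clustering."""
    k = len(centers)
    members = [[] for _ in range(k)]
    for p in coords:
        nearest = min(range(k), key=lambda i: euclidean(p, centers[i]))
        members[nearest].append(p)

    # Assumption: centers that attract no node are ignored.
    active = [i for i in range(k) if members[i]]
    if len(active) < 2:
        return float("inf")  # the ratio needs at least two clusters

    # Intra-cluster scatter: mean distance of the members to their center.
    scatter = {
        i: sum(euclidean(p, centers[i]) for p in members[i]) / len(members[i])
        for i in active
    }

    total = 0.0
    for i in active:
        worst = 0.0
        for j in active:
            if j == i:
                continue
            separation = euclidean(centers[i], centers[j])
            if separation == 0.0:
                return float("inf")  # assumption: coincident centers are rejected
            worst = max(worst, (scatter[i] + scatter[j]) / separation)
        total += worst
    return total / len(active)
```

For two well-separated groups of two nodes each, `db_index([(0, 0), (0, 2), (10, 0), (10, 2)], [(0, 1), (10, 1)])` evaluates the ratio (1 + 1) / 10 for both clusters; bringing the two groups closer together increases the value.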


3.2. Bat-Cluster 

Bat-Cluster, or BC, is a combination of two algorithms, FFDP and Bat Algorithm. FFDP will run 
first to set an optimized equilibrium positioning of the nodes of the graph. These node positions will then be 
assigned to the Bat Algorithm. 

Int J Elec & Comp Eng, Vol. 8, No. 2, April 2018: 1122-1130, ISSN: 2088-8708

BA starts by generating a population of bats. Each of these bats has its own initial loudness, pulse rate, position and velocity. The initial bat positions represent the initial cluster centers. When the algorithm starts running, each bat is assigned to a cluster center location. For each cluster center, the algorithm calculates the mean of the nodes closest to it. The cluster center's position is then updated, and the objective function is calculated as in Equation (2). If the value of the objective function has converged, we return the cluster center locations; otherwise, we reassign each bat to the corresponding cluster center once again.
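The center update described in this paragraph can be sketched as follows in Python. This is only an illustration under stated assumptions: the function name `update_centers` is hypothetical, nodes are assigned to their nearest center by Euclidean distance, and a center that attracts no node is left where it is.

```python
def update_centers(coords, centers):
    """Move each center to the mean of the nodes that are closest to it."""
    k = len(centers)
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k

    for p in coords:
        # Squared distances are enough to find the nearest center.
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]

    new_centers = []
    for i in range(k):
        if counts[i] == 0:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(tuple(centers[i]))
        else:
            new_centers.append(tuple(s / counts[i] for s in sums[i]))
    return new_centers
```

Calling this function repeatedly until the objective value stops changing corresponds to the convergence test mentioned above.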


If the random value rand is greater than the bat's pulse rate $r_i$, the algorithm selects a solution among the best solutions and generates a local solution around the selected best solution. If the random value rand is smaller than the loudness $A_i$, and the value of the objective function for the current bat position is better (smaller in our case) than the value of the best solution found so far, $f(x_i) < f(x_*)$, the new solution is accepted, the bat's pulse rate is increased, and its loudness is decreased. The solutions found are sorted, and the current best solution is stored.

The algorithm keeps running until the stop criterion is met. In our case, the algorithm stops when the iteration number $t$ becomes equal to the maximum number of iterations $M$.

The pseudocode of the Bat-Cluster algorithm is as follows:

Algorithm BatCluster(G, X, tol, K, M, N, A^0, r^0)

input: Graph G, nodes initial positions X, tolerance tol, nominal edge length K, maximum iterations number M, bats total population N, initial loudness A^0 and initial pulse rate r^0

// Set the graph's positioning
coords = FFDP(G, X, tol, K)  // coords is a 2-D or 3-D vector

/* Initialize the bats' loudness A_i, pulse rates r_i, frequencies f_i, velocities v_i and positions x_i */
For all bats i do {
    A_i = A^0
    r_i = r^0
    f_i = 0
    v_i = 0
    x_i = rand
    /* Calculate the solution of the objective function for the bat i according to the equation (2) */
    DB(x_i, coords)
}

// Select the best solution x_*
t = 0

While (t < M) {
    /* Generate a new solution by adjusting the frequencies of the bats according to the equation (4), updating the velocities and the positions of the bats according to the equations (5) and (6) */
    For all bats i do {
        f_i = f_min + (f_max - f_min) β
        v_i^t = v_i^{t-1} + (x_i^{t-1} - x_*) f_i
        x_i^t = x_i^{t-1} + v_i^t
        If (rand > r_i) {
            /* Generate a new solution around the best solution according to the equation (7) */
            x_new = x_old + ε A^t
        }
        If (rand < A_i and DB(x_i, coords) < DB(x_*, coords)) {
            // Accept the new solution
            x_* = x_i
            /* Update A_i and r_i according to equations (8) and (9) */
            A_i^{t+1} = α A_i^t
            r_i^{t+1} = r_i^0 [1 - exp(-γ t)]
        }
    }
    // Select the current best solution
    t = t + 1
}
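To make the loop above more concrete, here is a minimal Python sketch of a continuous Bat Algorithm in the spirit of Yang [8]. It is an illustration under stated assumptions, not the Bat-Cluster implementation: the function name `bat_minimize`, the box bounds, the default values of `f_min`, `f_max`, `alpha` and `gamma`, and the clipping of positions to the bounds are all choices made for the example. The objective is any function of a real-valued vector; in a Bat-Cluster setting it would be the DBIndex evaluated on the FFDP coordinates.

```python
import math
import random


def bat_minimize(objective, dim, lower, upper, n_bats=20, max_iter=100,
                 loudness0=1.0, pulse0=0.5, f_min=0.0, f_max=2.0,
                 alpha=0.9, gamma=0.9, seed=0):
    """Minimize `objective` over [lower, upper]^dim with a basic Bat Algorithm."""
    rng = random.Random(seed)

    def clip(value):
        return min(max(value, lower), upper)

    # Random initial positions, zero velocities, common loudness and pulse rate.
    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    loud = [loudness0] * n_bats
    pulse = [pulse0] * n_bats

    costs = [objective(p) for p in x]
    best_i = min(range(n_bats), key=lambda i: costs[i])
    best, best_cost = list(x[best_i]), costs[best_i]
    history = []

    for t in range(1, max_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Frequency, velocity and position updates.
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] = [vi + (xi - b) * freq for vi, xi, b in zip(v[i], x[i], best)]
            cand = [clip(xi + vi) for xi, vi in zip(x[i], v[i])]

            # Local random walk around the current best solution.
            if rng.random() > pulse[i]:
                cand = [clip(b + rng.uniform(-1.0, 1.0) * mean_loud) for b in best]

            cand_cost = objective(cand)
            # Accept only if the bat is "loud enough" and improves on the best.
            if rng.random() < loud[i] and cand_cost < best_cost:
                x[i] = cand
                best, best_cost = list(cand), cand_cost
                loud[i] *= alpha  # loudness decreases
                # Yang's rule: the pulse rate tends toward pulse0 as t grows.
                pulse[i] = pulse0 * (1.0 - math.exp(-gamma * t))
        history.append(best_cost)

    return best, best_cost, history
```

The returned `history` holds the best objective value after each iteration, which is the kind of curve plotted when examining convergence. Fixing `seed` makes a run reproducible.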


4. TESTING AND RESULTS 
4.1. Testing Environment 

For simulation purposes, we ran the Bat-Cluster algorithm on a computer with the following technical characteristics:
Environment: Java 8 + Matlab R2017a
OS: Windows 7
CPU: Intel i5 2450M 2.5 GHz
RAM: 4 GB
To test the performance of BC, we compared it with three well-established algorithms:
Particle Swarm Optimization
Ant Colony Optimization
Differential Evolution

We use the continuous version of all these algorithms, and the function to optimize is the DBIndex as depicted in Equation (2). This approach enables us to compare the performance of these algorithms on an equal footing.




4.2. Benchmark Graphs 

The graphs used in our tests are four benchmark graphs of different sizes that come from different domains. These graphs are available in the Gephi standard dataset, accessible at the following link:

https://github.com/medialab/benchmarkForceAtlas2/blob/master/dataset.zip (last checked: December 21, 2017).

The four graphs are:

a. facebook_ego_686 (168 nodes and 1656 links)

b. yeast (2361 nodes and 7182 links)

c. arxiv_general_relativity (5242 nodes and 28980 links)

d. oregon2_010331 (10900 nodes and 31180 links)

Table 1 displays the layouts of the four benchmark graphs.


Table 1. The Benchmark Graphs Layouts (panels: facebook_ego_686, yeast, arxiv_general_relativity, oregon2_010331)


4.3. Parameters Setting
Defining the correct parameters for a nature-inspired algorithm generally requires rigorous prior testing. The same goes for BC and all of the algorithms that we test it against.
After several experiments, we found that the following parameters answer our needs correctly:


a. Bat-Cluster: Maximum Iterations Number M = 200, Bat Population Size N = 50, Initial Loudness $A^0 = 1.1$, Initial Pulse Rate $r^0 = 0.5$

b. Particle Swarm Optimization: Maximum Iterations Number M = 200, Particle Population Size N = 50, Inertia Weight w = 1, Inertia Weight Damping Ratio wdamp = 0.99, Personal Learning Coefficient $c_1 = 1.5$, Global Learning Coefficient $c_2 = 2$

c. Ant Colony Optimization: Maximum Iterations Number M = 200, Ant Population Size N = 10, Sample Size nSample = 40, Intensification Factor q = 0.5, Deviation Distance Ratio $\zeta = 1$

d. Differential Evolution: Maximum Iterations Number M = 200, Individual Population Size N = 50, Crossover Probability pCR = 0.2
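For readers who want to reproduce the comparison, the settings listed above can be gathered in a single configuration structure. The Python sketch below is only one possible layout: the dictionary `PARAMS`, its key names and the helper `shared_value` are hypothetical and chosen for readability; only the numeric values come from the list above.

```python
# Parameter values as reported in the list above; the dictionary layout and
# the key names are hypothetical.
PARAMS = {
    "BC": {
        "max_iter": 200, "population": 50,
        "initial_loudness": 1.1, "initial_pulse_rate": 0.5,
    },
    "PSO": {
        "max_iter": 200, "population": 50,
        "inertia_weight": 1.0, "inertia_damping": 0.99,
        "personal_coefficient": 1.5, "global_coefficient": 2.0,
    },
    "ACO": {
        "max_iter": 200, "population": 10, "sample_size": 40,
        "intensification_factor": 0.5, "deviation_distance_ratio": 1.0,
    },
    "DE": {
        "max_iter": 200, "population": 50, "crossover_probability": 0.2,
    },
}


def shared_value(params, key):
    """Return the value of `key` if all algorithms defining it agree, else None."""
    values = {p[key] for p in params.values() if key in p}
    return values.pop() if len(values) == 1 else None
```

With this layout, `shared_value(PARAMS, "max_iter")` confirms that all four algorithms share the same iteration budget, while the population sizes differ (ACO uses 10 ants).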


4.4. Experimental Results
Tables 2 to 5 show the performance of Bat-Cluster compared with each of the other aforesaid algorithms on the four benchmark graphs.


Table 2. facebook_ego_686

Algorithm    Number of Clusters    DBIndex Value
BC           4                     0.72657
PSO          3                     0.72731
ACO          3                     0.80209
DE           3                     0.73884


In “facebook_ego_686”, Bat-Cluster provided the smallest value for the DBIndex, closely followed by PSO. Moreover, BC was the only algorithm able to provide four clusters, while the other algorithms provided only three clusters.


Table 3. yeast

Algorithm    Number of Clusters    DBIndex Value
BC           4                     0.69332
PSO          5                     0.69423
ACO          3                     0.81152
DE           3                     0.71968


The results for the “yeast” graph may seem debatable at first. Indeed, based on the DBIndex alone, BC was the best, but the fact that PSO provided more clusters raises the possibility that PSO could find a better value for the DBIndex. However, the evolution of the best values depicted in Figure 1 shows that PSO started stagnating after iteration 100 at a DBIndex value higher than the one provided by BC. This suggests that having 5 clusters may not be the best clustering scenario.
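The stagnation argument can be checked programmatically on any recorded convergence curve. The Python sketch below is a small illustrative helper, not part of the original experiments: the function name `last_improvement`, the tolerance `tol` and the example values are assumptions, and the sample history is synthetic rather than taken from the PSO run.

```python
def last_improvement(history, tol=1e-6):
    """Return the 1-based iteration at which the best value last improved
    by more than `tol`, or 0 if it never improved after the first iteration."""
    last = 0
    for t in range(1, len(history)):
        if history[t - 1] - history[t] > tol:
            last = t + 1  # history[t] is the value at iteration t + 1
    return last
```

For a synthetic curve such as `[1.0, 0.9, 0.8, 0.8, 0.8]`, the helper reports iteration 3 as the last improvement; every later iteration is stagnation.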


Table 4. arxiv_general_relativity

Algorithm    Number of Clusters    DBIndex Value
BC           6                     0.74797
PSO          4                     0.7558
ACO          3                     0.81471
DE           3                     0.76049


Figure 1. The evolution of the best DBIndex values provided by PSO for the “yeast” graph (x-axis: Iteration; y-axis: Best Cost)


In the “arxiv_general_relativity” graph, BC gave the smallest DBIndex value and more clusters (6 against 4 for the runner-up, PSO).


Table 5. oregon2_010331

Algorithm    Number of Clusters    DBIndex Value
BC           3                     0.58093
PSO          3                     0.58101
ACO          2                     0.71037
DE           2                     0.62902


Regarding the “oregon2_010331” graph, BC and PSO were able to provide 3 clusters, while the other two could only provide 2 clusters. The DBIndex values of BC and PSO were very close, with a small advantage for BC.

Overall, Bat-Cluster provided the best DBIndex values on all the benchmark graphs. The fact that it is closely followed by PSO shows the ability of swarm optimization algorithms to tackle this kind of problem. However, the results provided by ACO were poorer than expected. When we look at the evolution of the best value provided by ACO on “facebook_ego_686,” as in Figure 2, for example, we can see that the algorithm kept finding better values throughout the run, which suggests that it had not yet converged and that the configuration we gave to ACO may not be the best.


Figure 2. The evolution of the best DBIndex values provided by ACO for the “facebook_ego_686” graph 


5. CONCLUSION

This paper presented the Bat-Cluster (BC) algorithm, a combination of the FFDP algorithm developed by our team [7] and the Bat Algorithm developed by Xin-She Yang [8]. BC is designed to answer the need for automated large graph clustering. In contrast with several clustering algorithms available in the literature, which tend to formulate automated large graph clustering as a discrete problem, BC translates it into a continuous problem. The idea was to run a large graph layout algorithm, FFDP, and make it provide the coordinates of the equilibrium positions of the graph's nodes. Having these coordinates enabled us to translate the graph into a standard real-valued vector that can be handled by the continuous version of the Bat Algorithm. The quality metric we used to measure the quality of our clustering was the DBIndex by Davies and Bouldin [22]. The Bat-Cluster algorithm was tested on four benchmark graphs of different sizes and from different domains. On these graphs, BC obtained the lowest DBIndex values among the compared algorithms (PSO, ACO and DE), which makes it a good alternative for solving the automated large graph clustering problem.

The Bat-Cluster algorithm will be integrated into XEWGraph [23], the large graph visualization service of the Competitive Intelligence tool Xplor EveryWhere [24]. Coupled with the out-of-the-box categorization provided by XEWGraph's hypergraph approach, BC will enable the user to have large graphs clustered and expanded on demand in both the web and the mobile-oriented interfaces of XEWGraph.